Problem 1 : Discuss this example with your class mates and explain what is hapening and comment on the results.

Throwing Dice as Multinomial Distribution

A distribution that shows the likelihood of the possible results of a experiment with repeated trials in which each trial can result in a specified number of outcomes that is greater than two. A multinomial distribution could show the results of tossing a dice, because a dice can land on one of six possible values. By contrast, the results of a coin toss would be shown using a binomial distribution because there are only two possible results of each toss, heads or tails.

Two additional key characteristics of a multinomial distribution are that the trials it illustrates must be independent (e.g., in the dice experiment, rolling a five does not have any impact on the number that will be rolled next) and the probability of each possible result must be constant (e.g., on each roll, there is a one in six chance of any number on the die coming up).

Rolling a die N=100 times

one.dice <- function(){
  dice <- sample(1:6, size = 1, replace = TRUE)
  return(dice)
}

one.dice() #what is happening here?? try this several times.
## [1] 5
#what is hapening here?

par(mfrow=c(2,2))

for (i in 1:4){
sims <- replicate(100, one.dice())
table(sims)
table(sims)/length(sims)
plot(table(sims), xlab = 'Event', ylab = 'Frequency')
}

Rolling a die N=10000 times.

#what is hapening here?

par(mfrow=c(2,2))

for (i in 1:4){
sims <- replicate(10000, one.dice())
table(sims)
table(sims)/length(sims)
plot(table(sims), xlab = 'Event', ylab = 'Frequency')
}

Problem 2. Multinomial distribution and its Marginals

From the class example

Let’s say that Molly, Ryan and Mr.Bob are buying beer(x1) ,bread(x2) and coke(x3) with probabilities (3/5,1/5,1/5).

  1. What is the probability that only 1 of them will buy beer, 2 of them will buy Bread , none will buy coke ? Compare the result with theoratical probability.

  2. Do a simulation for this scenario and plot the marginal distribution of x1.

Problem 3: Discuss this with your class mates and comment on the results. What can you obeserve from each plot?

Helpful links to answer this question:

-> Contour plot also gives the densities.

https://blog.revolutionanalytics.com/2016/02/multivariate_data_with_r.html

-> Then we have these ellipses; the circular symmetric version of complex normal distribution. https://en.wikipedia.org/wiki/Elliptical_distribution

ellipse: https://en.wikipedia.org/wiki/Ellipse

-> http://cs229.stanford.edu/section/gaussians.pdf

This tells you how when correlation coefficient increases the distribution spread and how the ellipses look like. -> https://online.stat.psu.edu/stat505/book/export/html/636

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.3     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.1.1
## Warning: package 'readr' was built under R version 4.1.1
## Warning: package 'stringr' was built under R version 4.1.1
## Warning: package 'forcats' was built under R version 4.1.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(mvtnorm)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:plotly':
## 
##     select
## The following object is masked from 'package:dplyr':
## 
##     select
library(ggplot2)

Source: https://data-se.netlify.app/2018/12/13/visualizing-a-multivariate-normal-distribution/

Simulate multivariate normal data

First, let’s define a covariance matrix Σ:

sigma <- matrix(c(4,2,2,3), ncol = 2)
sigma
##      [,1] [,2]
## [1,]    4    2
## [2,]    2    3

Then, simulate observations n = n from these covariance matrix; the means need be defined, too. As the rank of our covariance matrix is 2, we need two means:

means <- c(0, 0)
n <- 1000

set.seed(42)
x <- rmvnorm(n = n, mean = means, sigma = sigma)
str(x)
##  num [1:1000, 1:2] 2.314 1.053 0.716 2.848 3.839 ...

Let’s make a data frame out of it:

d <- data.frame(x)
names(d)
## [1] "X1" "X2"

Plotting

Plotting univariate (sampled) normal data

d %>% 
  ggplot(aes(x = X1)) +
  geom_density()

Plot theoretic normal curve

p1 <- data_frame(x = -3:3) %>% 
  ggplot(aes(x = x)) +
  stat_function(fun = dnorm, n = n)
## Warning: `data_frame()` was deprecated in tibble 1.1.0.
## Please use `tibble()` instead.
p1

Plotting multivariate data

2D density

p2 <- ggplot(d, aes(x = X1, y = X2)) +
  geom_point(alpha = .5) +
  geom_density_2d()

p2

Geom binhex

p3 <- ggplot(d, aes(x = X1, y = X2)) +
  geom_point(alpha = .5) +
  geom_bin2d() +
  scale_fill_viridis_c()

p3

2D histogram (heatmap) with plotly

(p <- plot_ly(d, x = ~X1, y = ~X2))
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plotly.com/r/reference/#scatter
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
add_histogram2d(p)

2D cntour with plotly

add_histogram2dcontour(p)

3D plot

Surface

dens <- kde2d(d$X1, d$X2)

plot_ly(x = dens$x,
        y = dens$y,
        z = dens$z) %>% add_surface()

3D Scatter

First, compute the density of each (X1, X2) pair.

d$dens <- dmvnorm(x = d)

Now plot a point for each (X1, X2, dens) tuple.

p4 <- plot_ly(d, x = ~ X1, y = ~ X2, z = ~ dens,
              marker = list(color = ~ dens,
                            showscale = TRUE)) %>% 
  add_markers()

p4